
    Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

    Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely because of a shortage of publicly available training data and a lack of standard criteria for assigning ground-truth labels. In this study, we address the need for reliable data with an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors (aggression, repetition, intent, visibility, and imbalance) that represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon. Comment: 12 pages, 5 figures, 22 tables. Accepted to the 14th International AAAI Conference on Web and Social Media, ICWSM'2
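
    A minimal sketch of how social-network and language signals could sit side by side in one feature vector per message. Everything here is a hypothetical illustration, not the authors' feature set: the `Message` type, the placeholder lexicon `AGGRESSIVE_WORDS`, the use of follower counts as proxies, and the mapping of four of the five factors onto simple counts. Intent is left out because surface counts do not capture it well.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    target: str
    text: str


# Hypothetical placeholder lexicon; a real system would use a curated resource.
AGGRESSIVE_WORDS = {"stupid", "loser", "ugly"}


def features(msg: Message, history: list[Message], followers: dict[str, int]) -> dict[str, float]:
    """Combine language-based and social-network signals for one message.

    The factor-to-feature mapping below is an illustrative assumption.
    """
    tokens = [t.strip(".,!?") for t in msg.text.lower().split()]
    # Aggression (language): number of lexicon hits in the message.
    aggression = sum(t in AGGRESSIVE_WORDS for t in tokens)
    # Repetition (social): earlier messages from the same sender to the same target.
    repetition = sum(1 for m in history if m.sender == msg.sender and m.target == msg.target)
    sender_f = followers.get(msg.sender, 0)
    target_f = followers.get(msg.target, 0)
    # Imbalance (social): log follower ratio; positive when the sender has the larger audience.
    imbalance = math.log1p(sender_f) - math.log1p(target_f)
    # Visibility (social): log of the sender's audience size.
    visibility = math.log1p(sender_f)
    return {
        "aggression": float(aggression),
        "repetition": float(repetition),
        "imbalance": imbalance,
        "visibility": visibility,
    }
```

    Vectors like these could feed any standard classifier; the point of the sketch is only that social features enter the model alongside text features rather than replacing them.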

    Flying, phones and flu: Anonymized call records suggest that Keflavik International Airport introduced pandemic H1N1 into Iceland in 2009

    Background: Data collected by mobile devices can augment surveillance of epidemics in real time. However, methods and evidence for integrating these data into modern surveillance systems are sparse. We linked call detail records (CDR) with an influenza-like illness (ILI) registry and evaluated the role that Icelandic international travellers played in the introduction and propagation of the influenza A/H1N1pdm09 virus in Iceland over the course of the 2009 pandemic. Methods: This nested case-control study compared the odds of exposure to Keflavik International Airport among cases and matched controls, producing longitudinal two-week matched odds ratios (mORs) from August to December 2009. We further compared rates of ILI among 1st- and 2nd-degree phone connections of cases with those of their matched controls. Results: The mOR was elevated in the initial stages of the epidemic, from 7 August until 21 August (mOR = 2.53; 95% confidence interval (CI) = 1.35, 4.78). For the two-week period from 17 August through 31 August, the incidence density ratio of ILI among 1st-degree connections was 2.96 (95% CI: 1.43, 5.84). Conclusions: Exposure to Keflavik International Airport increased the risk of incident ILI diagnoses during the initial stages of the epidemic. Applying these methods to other regions of Iceland, we evaluated the geographic spread of ILI over the course of the epidemic. Our methods were validated through a similar evaluation of a domestic airport. The techniques described in this study can be used for hypothesis-driven evaluations of locations and behaviours during an epidemic and their associations with health outcomes. Funding: Icelandic Centre for Research Award #152620-051, an Emory University Research Council Award, NSF CAREER Award #1553579, a Leverhulme Early Career Fellowship, and a hardware donation from NVIDIA Corporation. Peer reviewed.
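
    For readers unfamiliar with the two effect measures, here is a minimal sketch of a matched odds ratio and an incidence density ratio, each with a large-sample (Wald) 95% interval on the log scale. It assumes 1:1 matched pairs; the study's own estimates may come from a different model (for example, conditional logistic regression with several controls per case), and the function names and any counts passed to them are hypothetical.

```python
import math


def matched_odds_ratio(case_only: int, control_only: int, z: float = 1.96) -> tuple[float, float, float]:
    """Matched-pairs odds ratio with a Wald CI on the log scale (1:1 matching assumed).

    case_only:    pairs where the case was exposed and the control was not
    control_only: pairs where the control was exposed and the case was not
    Concordant pairs carry no information about the ratio and are not needed.
    """
    if case_only == 0 or control_only == 0:
        raise ValueError("discordant-pair counts must be positive")
    m_or = case_only / control_only
    se = math.sqrt(1 / case_only + 1 / control_only)
    log_or = math.log(m_or)
    return m_or, math.exp(log_or - z * se), math.exp(log_or + z * se)


def incidence_density_ratio(events_1: int, time_1: float, events_0: int, time_0: float,
                            z: float = 1.96) -> tuple[float, float, float]:
    """Ratio of incidence rates (events per unit of person-time) with a log-scale Wald CI."""
    if events_1 == 0 or events_0 == 0:
        raise ValueError("event counts must be positive")
    idr = (events_1 / time_1) / (events_0 / time_0)
    se = math.sqrt(1 / events_1 + 1 / events_0)
    log_idr = math.log(idr)
    return idr, math.exp(log_idr - z * se), math.exp(log_idr + z * se)
```

    With made-up counts, 30 pairs in which only the case was exposed and 12 in which only the control was exposed give an mOR of 30 / 12 = 2.5. Because the interval is symmetric on the log scale, it is asymmetric around the point estimate, as in the reported intervals.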

    Cell-phone traces reveal infection-associated behavioral change

    Epidemic preparedness depends on our ability to predict the trajectory of an epidemic and the human behavior that drives spread in the event of an outbreak. Changes to behavior during an outbreak limit the reliability of syndromic surveillance using large-scale data sources, such as online social media or search behavior, which could otherwise supplement healthcare-based outbreak-prediction methods. Here, we measure behavior change reflected in mobile-phone call-detail records (CDRs), a source of passively collected, real-time behavioral information, using an anonymously linked dataset of cell-phone users and their date of influenza-like illness diagnosis during the 2009 H1N1v pandemic. We demonstrate that mobile-phone use during illness differs measurably from routine behavior. On average, diagnosed individuals exhibit less movement than normal in the 2 to 4 days around diagnosis (1.1 to 1.4 fewer unique tower locations; [Formula: see text]), and on the day following diagnosis they place fewer calls (2.3 to 3.3 fewer calls; [Formula: see text]) while spending longer on the phone (41- to 66-s average increase; [Formula: see text]) than usual. The results suggest that anonymously linked CDRs and health data may be sufficiently granular to augment epidemic surveillance efforts and that infectious disease-modeling efforts lacking explicit behavior-change mechanisms need to be revisited. Keywords: call detail records; disease; influenza; outbreak; surveillance. Funding: Alan Turing Institute; Engineering and Physical Sciences Research Council EP/N510129/1; UK Research & Innovation (UKRI); Medical Research Council UK (MRC); European Commission; National Institute for Health Research (NIHR) Health Protection Research Unit in Evaluation of Interventions at the University of Brist
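
    The movement measure compares each person with their own routine. A minimal sketch of that idea for unique tower locations follows; the `(day, tower_id)` record layout, the two-day window on either side of diagnosis, the 28-day baseline and the function names are all hypothetical choices, not the paper's exact definitions.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_unique_towers(records):
    """records: iterable of (day, tower_id) pairs from one user's CDRs."""
    towers = defaultdict(set)
    for day, tower in records:
        towers[day].add(tower)
    return {day: len(ts) for day, ts in towers.items()}


def _mean_over(counts, days):
    # Days without any records are skipped rather than counted as zero movement.
    values = [counts[d] for d in days if d in counts]
    return mean(values) if values else None


def tower_change(records, diagnosis: date, window: int = 2, baseline_days: int = 28):
    """Mean unique towers per day near diagnosis minus the user's own baseline mean.

    Negative values mean the user was observed at fewer towers than usual.
    Returns None if either period has no records.
    """
    counts = daily_unique_towers(records)
    near = [diagnosis + timedelta(days=k) for k in range(-window, window + 1)]
    baseline = [diagnosis - timedelta(days=k)
                for k in range(window + 1, window + 1 + baseline_days)]
    near_mean = _mean_over(counts, near)
    base_mean = _mean_over(counts, baseline)
    if near_mean is None or base_mean is None:
        return None
    return near_mean - base_mean
```

    Averaging this per-person difference over all diagnosed users gives a population-level estimate of the kind reported above. Skipping silent days avoids confusing "no calls" with "no movement", at the cost of ignoring days on which a sick person may simply not have used the phone.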

    Development of a new barcode-based, multiplex-PCR, next-generation-sequencing assay and data processing and analytical pipeline for multiplicity of infection detection of Plasmodium falciparum.

    BACKGROUND Simultaneous infection with multiple malaria parasite strains is common in high-transmission areas. Quantifying the number of strains per host, or the multiplicity of infection (MOI), provides additional parasite indices for assessing transmission levels, but MOI is challenging to measure accurately with current tools. This paper presents new laboratory and analytical methods for estimating the MOI of Plasmodium falciparum. METHODS Based on 24 single nucleotide polymorphisms (SNPs) previously identified as stable, unlinked targets across 12 of the 14 chromosomes of the P. falciparum genome, three multiplex PCRs of short target regions, followed by next-generation sequencing (NGS) of the amplicons, were developed. A bioinformatics pipeline including the B4Screening pathway removed spurious amplicons to ensure consistent frequency calls at each SNP location, compiled amplicons by SNP site diversity, and performed algorithmic haplotype and strain reconstruction. The pipeline was validated on 108 samples generated from mixtures of cultured laboratory strains in different proportions and concentrations, with and without pre-amplification, using both whole blood and dried blood spots (DBS). The pipeline was then applied to 273 smear-positive samples from surveys conducted in western Kenya, and its output was passed to StrainRecon Thresholding for Infection Multiplicity (STIM), a novel MOI estimator. RESULTS Using the pipeline, the 24 barcode SNPs were identified uniformly across the 12 chromosomes of P. falciparum in a sample. Pre-amplification and parasite concentration, although non-linearly associated with SNP read depth, did not influence the SNP frequency calls. Given consistent SNP frequency calls at the targeted locations, the algorithmic strain reconstruction achieved 98.5% accuracy in identifying the dominant strains of each laboratory-mixed sample. STIM detected up to 5 strains in field samples from western Kenya and showed a decline in MOI over time (q < 0.02), from 4.32 strains per infected person in 1996 to 4.01, 3.56 and 3.35 in 2001, 2007 and 2012, respectively, together with a reduction in the proportion of samples with 5 strains from 57% in 1996 to 18% in 2012. CONCLUSION The combined approach of new multiplex PCRs and NGS, the unique bioinformatics pipeline and STIM could identify the 24 barcode SNPs of P. falciparum correctly and consistently. The methodology could be applied to field samples to reliably measure temporal changes in MOI.
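
    The pipeline's first job is a within-sample allele-frequency call at each barcode SNP. A minimal sketch of such a call from read counts follows. It is not the B4Screening pathway or STIM: the `SnpReads` type, the site names, the minimum read depth and the noise threshold are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SnpReads:
    site: str
    ref_reads: int
    alt_reads: int


def call_frequencies(reads: list[SnpReads], min_depth: int = 50,
                     min_freq: float = 0.02) -> dict[str, float]:
    """Return {site: alternate-allele frequency} for sites with enough read depth.

    Frequencies below min_freq (or above 1 - min_freq) are snapped to 0 or 1,
    treating them as likely sequencing noise (a hypothetical rule).
    """
    calls = {}
    for r in reads:
        depth = r.ref_reads + r.alt_reads
        if depth < min_depth:
            continue  # too few reads for a stable call
        freq = r.alt_reads / depth
        if freq < min_freq:
            freq = 0.0
        elif freq > 1 - min_freq:
            freq = 1.0
        calls[r.site] = freq
    return calls


def mixed_sites(calls: dict[str, float]) -> list[str]:
    """Sites where both alleles are present."""
    return [site for site, freq in calls.items() if 0.0 < freq < 1.0]
```

    If the calls are free of error, any mixed site at a biallelic SNP implies at least two strains in the sample. That is only a lower bound on MOI; estimating the full strain count is what the haplotype reconstruction and STIM are for.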

    A Delivery Network Creation Game

    We study a non-cooperative network creation game in which players, represented by nodes, can build edges to other players at a cost of α per edge and strive to maintain short paths to other players while minimizing cost. In addition to the charges for constructing edges, players incur a penalty of β for each unreachable node, and each player attempts to optimize its accrued cost. The model generalizes previous work and provides an abstraction for describing the synthesis of various economic networks. For instance, in a network for transporting goods between facilities, the cost parameter α can intuitively be viewed as the price of establishing a route between facilities, and β as the value (or incentive) of having access to goods at a remote site. We observe sharp changes in the optima as the α and β parameters vary. Furthermore, we bound the price of anarchy of the game for all values of α, β and n, where n is the number of players. We identify surprising properties in the structure of Nash equilibria: not only do zero-incentive strict Nash equilibria of arbitrarily large size exist, but they also exhibit properties such as constant diameter and resilience to any single-edge deletion. Lastly, we identify the first super-constant lower bound on the price of anarchy in this line of research and prove that it persists even if the model incorporates coalitions of size up to o(√n).
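
    To make the cost model concrete, here is a minimal sketch of one player's cost. It assumes undirected, unit-length edges that both endpoints may use once either of them pays for them, and a distance term equal to the sum of hop distances to reachable nodes; that sum-of-distances form is common in network creation games, but the paper's exact distance term may differ. `bought`, `hop_distances` and `player_cost` are hypothetical names.

```python
from collections import deque


def hop_distances(adj: dict[int, set[int]], source: int) -> dict[int, int]:
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def player_cost(n: int, bought: dict[int, set[int]], player: int,
                alpha: float, beta: float) -> float:
    """Cost to `player`: alpha per edge it bought, plus the sum of hop distances
    to the nodes it can reach, plus beta for each node it cannot reach."""
    adj = {u: set() for u in range(n)}
    for u, targets in bought.items():
        for v in targets:
            adj[u].add(v)
            adj[v].add(u)  # once built, an edge can be used in both directions
    dist = hop_distances(adj, player)
    unreachable = n - len(dist)
    return alpha * len(bought.get(player, ())) + sum(dist.values()) + beta * unreachable
```

    For example, with four nodes, bought = {0: {1}, 1: {2}}, α = 2 and β = 10, player 0 pays 2 for its edge, 1 + 2 = 3 in distances and 10 for the isolated node 3, for a total of 15. Raising β relative to α is what makes connecting to remote players worthwhile, which is one way to see why the optima shift as the two parameters vary.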

    Affinity In Distributed Systems

    In this dissertation we address shortcomings of two important group communication layers, IP Multicast (IPMC) and gossip-based message dissemination, both of which have scalability issues as the number of groups grows. We propose a transparent and backward-compatible layer called Dr. Multicast that allows data center administrators to enable IPMC for large numbers of groups without causing stability issues. Dr. Multicast optimizes IPMC resources by grouping together groups with similar membership, minimizing both redundant transmissions and the cost of filtering unwanted messages. We then argue that when nodes belong to multiple groups, gossip-based communication loses its appealing property of using a fixed amount of bandwidth. We propose a platform called GO (for Gossip Objects) that bounds each node’s bandwidth use to a customizable limit, prohibiting applications from joining groups that would cause the limit to be exceeded. Both systems incorporate optimizations based on group similarity, or affinity. We explore group affinity in real datasets from social networks and in a trace from an industrial setting. We present new models to characterize overlaps between groups and discuss our results in the context of Dr. Multicast and GO. The chapters on Dr. Multicast and GO are self-contained, extended versions of papers that appeared, respectively, in the ACM Hot Topics in Networks (HotNets) Workshop 2008 [85] and the International Peer-to-Peer (P2P) Conference 2009 [87].
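
    Grouping "similar groups in terms of membership" needs some similarity measure. A minimal sketch follows, using the Jaccard index and a greedy threshold merge; both are assumptions chosen for illustration and are not claimed to be Dr. Multicast's actual algorithm. The names `merge_similar_groups` and `unwanted_deliveries` and the threshold value are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two member sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_similar_groups(groups: dict[str, set], threshold: float) -> list[list]:
    """Greedily fold each group into the first cluster whose member set is at
    least `threshold`-similar; otherwise start a new cluster.

    Returns a list of [member_set, [group_names]]; each cluster would share one
    multicast address.
    """
    clusters: list[list] = []
    for name, members in groups.items():
        for cluster in clusters:
            if jaccard(cluster[0], members) >= threshold:
                cluster[0] |= members  # the shared address now reaches the union
                cluster[1].append(name)
                break
        else:
            clusters.append([set(members), [name]])
    return clusters


def unwanted_deliveries(cluster_members: set, group_members: set) -> int:
    """Receivers that get, and must filter out, a message sent to this group."""
    return len(cluster_members - group_members)
```

    The threshold captures the trade-off named above: a lower value merges more groups and saves addresses, but each message then reaches more nodes that must discard it.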